{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ef3180e2-c04a-4423-a3ac-76e624e89b59",
   "metadata": {
    "id": "ef3180e2-c04a-4423-a3ac-76e624e89b59"
   },
   "source": [
    "<div align=\"center\">\n",
    "<h1><a href=\"https://github.com/peremartra/Large-Language-Model-Notebooks-Course\">LLM Hands On Course</a></h1>\n",
    "    <h3>Understand And Apply Large Language Models</h3>\n",
    "    <h2>Setting UP a NL2SQL project with Ollama using Llama-3.</h2>\n",
    "    <p>by <b>Pere Martra</b></p>\n",
    "</div>\n",
    "\n",
    "<br>\n",
    "\n",
    "<div align=\"center\">\n",
    "    &nbsp;\n",
    "    <a target=\"_blank\" href=\"https://www.linkedin.com/in/pere-martra/\"><img src=\"https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social\"></a>\n",
    "    \n",
    "</div>\n",
    "<br>\n",
    "<hr>\n",
    "\n",
    "\n",
    "Ollama is one of simplest ans easiest to configure model Server that you can use in your development environmet.\n",
    "\n",
    "In this notebook you are going to use Models from Ollama and create a custom one abel to generate SQL.\n",
    "\n",
    "**This noteebooks needs to run in a system with ollama installed, not in google Colab, son install ollama in your computer and the you will be able to execute the notebook.** "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "pZWdoIHht09X",
   "metadata": {
    "id": "pZWdoIHht09X"
   },
   "source": [
    "![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_6-18.jpg?raw=true)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ua74HQW7x3lm",
   "metadata": {
    "id": "ua74HQW7x3lm"
   },
   "source": [
    "You only need to download Ollama (https://ollama.com/) and follow the instructions in just few minutes your Ollama server will bve ready.\n",
    "\n",
    "With Ollama installed you just need to install de Python API and load it in the notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "v4cbxkolzQg7",
   "metadata": {
    "id": "v4cbxkolzQg7"
   },
   "source": [
    "Once installed you can Pull the Modeles you are interested in using the command:\n",
    "**ollama pull < model_name >.**\n",
    "\n",
    "pere@Peres-MBP ~ % ollama pull llama3\n",
    "\n",
    "*pulling manifest\n",
    "pulling ef311de6af9d... 100% ▕████████████████████▏ 5.0 GB                         \n",
    "pulling 097a36493f71... 100% ▕████████████████████▏ 8.4 KB                         \n",
    "pulling 109037bec39c... 100% ▕████████████████████▏  136 B                         \n",
    "pulling 65bb16cf5983... 100% ▕████████████████████▏  109 B                         \n",
    "pulling 0c2a5137eb3c... 100% ▕████████████████████▏  483 B                         \n",
    "verifying sha256 digest\n",
    "writing manifest\n",
    "removing any unused layers\n",
    "success*\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "205185e9-028d-4ea3-900a-b7220d46dfec",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "205185e9-028d-4ea3-900a-b7220d46dfec",
    "outputId": "c531f079-8b2b-4021-a0a7-8fa764e3713d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[?25l     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/75.6 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m2.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25h"
     ]
    }
   ],
   "source": [
    "pip install -q ollama"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3c099f9e-8e47-4f57-935c-f3b3859d1354",
   "metadata": {
    "id": "3c099f9e-8e47-4f57-935c-f3b3859d1354"
   },
   "outputs": [],
   "source": [
    "import ollama"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "nJ6AUhjp0xb9",
   "metadata": {
    "id": "nJ6AUhjp0xb9"
   },
   "source": [
    "Now is necessary to setup a user_message and instruction for the prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8e1dd46c-d178-48a3-9be2-ab4686916226",
   "metadata": {
    "id": "8e1dd46c-d178-48a3-9be2-ab4686916226"
   },
   "outputs": [],
   "source": [
    "# Define the user message to send.\n",
    "user_message = \"What is the name of the best paid employee?\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9819b12-f272-4bf0-9b1e-a5d7367b9ac6",
   "metadata": {
    "id": "834ac177-2c1b-4dae-ad80-37f74702c78d"
   },
   "source": [
    "model_instructions = \"\"\"\n",
    "Your task is to convert a question into a SQL query, given a SQL database schema.\n",
    "Adhere to these rules:\n",
    "- **Deliberately go through the question and database schema word by word to appropriately answer the question.\n",
    "- **Return Only SQL Code.\n",
    "- **Don't add any explanation.\n",
    "   ### Input\n",
    "   This SQL query generatd will run on a database whose schema is represented below:\n",
    "\n",
    "   create table employees(\n",
    "       ID_Usr INT primary key,-- Unique Id for employee\n",
    "       name VARCHAR -- Name of employee\n",
    "       );\n",
    "\n",
    "   create table salary(\n",
    "       ID_Usr INT,-- Unique Id for employee\n",
    "       year DATE, -- Date\n",
    "       salary FLOAT, --Salary of employee\n",
    "       foreign key (ID_Usr) references employees(ID_Usr) -- Join Employees with salary\n",
    "       );\n",
    "\n",
    "   create table studies(\n",
    "       ID_study INT, -- Unique ID study\n",
    "       ID_Usr INT, -- ID employee\n",
    "       educational_level INT,  -- 5=phd, 4=Master, 3=Bachelor\n",
    "       Institution VARCHAR, --Name of instituon where eployee studied\n",
    "       Years DATE, -- Date acomplishement stdy\n",
    "       Speciality VARCHAR, -- Speciality of studies\n",
    "       primary key (ID_study, ID_Usr), --Primary Key ID_Usr + ID_Study\n",
    "       foreign key(ID_Usr) references employees (ID_Usr)\n",
    "       );\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "818c8d38-2b05-4b41-adae-879e79e229d7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT e.name \n",
      "FROM employees e \n",
      "JOIN salary s ON e.ID_Usr = s.ID_Usr \n",
      "ORDER BY s.salary DESC LIMIT 1;\n"
     ]
    }
   ],
   "source": [
    "response = ollama.generate(model='myllamasql', \n",
    "                           system=model_instructions, \n",
    "                           prompt = user_message) \n",
    "print(response['response'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60721acc-8de2-4b9e-b5d0-a794a469d329",
   "metadata": {},
   "source": [
    "As you noticed you are passing the user question and the model instructions in two different variables. This is due to the template that ollama use to call the model. \n",
    "\n",
    "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n",
    "\n",
    "{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n",
    "\n",
    "{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n",
    "\n",
    "\n",
    "\n",
    "You can create a new model based in one existent and override the variables, to change how the model works. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "adcfbe2a-f191-4ada-a88b-db096a9f9732",
   "metadata": {
    "id": "adcfbe2a-f191-4ada-a88b-db096a9f9732"
   },
   "outputs": [],
   "source": [
    "#Configuration file based in llama3 and using One Shot Sample. \n",
    "modelfile = \"\"\"\n",
    "FROM llama3\n",
    "\n",
    "MESSAGE user How Many employes we have with a salary bigger than 50000?\n",
    "MESSAGE assistant SELECT COUNT(*) AS total_employees FROM employees e INNER JOIN salary s ON e.ID_Usr = s.ID_Usr WHERE s.salary > 50000;\n",
    "\n",
    "PARAMETER repeat_penalty 1.2\n",
    "PARAMETER temperature 0.1\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "18a2960e-a491-4276-954f-cf5764c7e5b1",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 408
    },
    "id": "18a2960e-a491-4276-954f-cf5764c7e5b1",
    "outputId": "5ffc3e5b-4c4d-4c0f-c2dc-8b86850bfc79"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'status': 'success'}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Creating a new model that have diferent hyperparameters and One shot Sample \n",
    "ollama.create(model=\"llamasql\", modelfile=modelfile)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "3d93f99c-37f5-4f22-adbb-d5511a568d84",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 408
    },
    "id": "3d93f99c-37f5-4f22-adbb-d5511a568d84",
    "outputId": "cee22210-ebbe-4adf-b866-29d928a77bd8"
   },
   "outputs": [],
   "source": [
    "response = ollama.generate(model='llamasql',\n",
    "                           system=model_instructions,\n",
    "                           prompt = user_message)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "1f05da49-39c1-431b-8963-aa5e2ccf7aad",
   "metadata": {
    "id": "1f05da49-39c1-431b-8963-aa5e2ccf7aad",
    "outputId": "dab921b6-0f1e-4264-cd83-75308d31abbc"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT e.name \n",
      "FROM employees e \n",
      "JOIN salary s ON e.ID_Usr = s.ID_Usr \n",
      "ORDER BY s.salary DESC LIMIT 1;\n"
     ]
    }
   ],
   "source": [
    "print(response['response'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fea88fc9-e90d-4804-b23f-1aa18481a7db",
   "metadata": {},
   "source": [
    "Is posible to create a new model with a file  stored in a directory. \n",
    "\n",
    "Sample of file: \n",
    "_____________\n",
    "FROM llama3\n",
    "\n",
    "SYSTEM \"\"\" Your task is to convert a question into a SQL query, given a SQL database schema.\n",
    "Adhere to these rules:\n",
    "- **Deliberately go through the question and database schema word by word to appropriately answer the question.\n",
    "- **Return Only SQL Code. \n",
    "- **Don't add any explanation.\n",
    "   \n",
    "   This SQL query generatd will run on a database whose schema is represented below:\n",
    "\n",
    "   create table employees(\n",
    "       ID_Usr INT primary key,-- Unique Id for employee\n",
    "       name VARCHAR -- Name of employee\n",
    "       );\n",
    "\n",
    "   create table salary(\n",
    "       ID_Usr INT,-- Unique Id for employee\n",
    "       year DATE, -- Date\n",
    "       salary FLOAT, --Salary of employee\n",
    "       foreign key (ID_Usr) references employees(ID_Usr) -- Join Employees with salary\n",
    "       );\n",
    "\n",
    "   create table studies(\n",
    "       ID_study INT, -- Unique ID study\n",
    "       ID_Usr INT, -- ID employee\n",
    "       educational_level INT,  -- 5=phd, 4=Master, 3=Bachelor\n",
    "       Institution VARCHAR, --Name of instituon where eployee studied\n",
    "       Years DATE, -- Date acomplishement stdy\n",
    "       Speciality VARCHAR, -- Speciality of studies\n",
    "       primary key (ID_study, ID_Usr), --Primary Key ID_Usr + ID_Study\n",
    "       foreign key(ID_Usr) references employees (ID_Usr)\n",
    "       );\"\"\"\n",
    "\n",
    "MESSAGE user How Many employes we have with a salary bigger than 50000?\n",
    "\n",
    "MESSAGE assistant \"\"\"\n",
    "SELECT COUNT(*) AS total_employees\n",
    "FROM employees e\n",
    "INNER JOIN salary s ON e.ID_Usr = s.ID_Usr\n",
    "WHERE s.salary > 50000;\"\"\"\n",
    "\n",
    "MESSAGE user Return the names of the three people who have had the highest salary increase in the last three years.\n",
    "\n",
    "MESSAGE assistant \"\"\"\n",
    "SELECT e.name\n",
    "FROM employees e\n",
    "JOIN salary s ON e.ID_usr = s.ID_usr\n",
    "WHERE s.year >= DATE_SUB(CURDATE(), INTERVAL 3 YEAR)\n",
    "GROUP BY e.name\n",
    "ORDER BY (MAX(s.salary) - MIN(s.salary)) DESC\n",
    "LIMIT 3;\"\"\"\n",
    "\n",
    "PARAMETER repeat_penalty 1.2\n",
    "PARAMETER temperature 0.1\n",
    "_____________________________\n",
    "\n",
    "This file contains the system instructions, a pair of shots and two hyperparameters. To create the Model you can use the order: \n",
    "\n",
    "**ollama create llamanlsql -f < filename >**\n",
    "from your terminal. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "7c35b45b-ad68-478d-9752-a5c506f15f47",
   "metadata": {
    "id": "7c35b45b-ad68-478d-9752-a5c506f15f47"
   },
   "outputs": [],
   "source": [
    "#Use this order to call the recent create model that contains the system message. \n",
    "#Notice that you don't need to pass the system, just the user message. \n",
    "response = ollama.generate(model='llamanlsql',\n",
    "                           prompt = user_message)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdb2bf9f-7bfe-4370-ac4b-b005560ae555",
   "metadata": {
    "id": "fdb2bf9f-7bfe-4370-ac4b-b005560ae555",
    "outputId": "81a33164-5ff1-4650-d9d6-ed8bd0932b37"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT e.name \n",
      "FROM employees e \n",
      "JOIN salary s ON e.ID_Usr = s.ID_Usr \n",
      "ORDER BY s.salary DESC LIMIT 1;\n"
     ]
    }
   ],
   "source": [
    "print(response['response'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcfc4001-a1c0-4ce1-a9d2-dfce427ccebd",
   "metadata": {
    "id": "c07a07bc-bc4b-4fff-93e8-f6a3e45283a3"
   },
   "source": [
    "# Conclusions\n",
    "Ollama is incredibly powerful for how easy it is to install and use. In just a few minutes of installation, four lines of code, and a configuration file, you've managed to have a Large Language Model running on your machine, configured to generate SQL.\n",
    "\n",
    "If you look among the models supported by Ollama, you'll find SQLCoder. It's a perfect model for generating SQL from natural language, and it's based on Llama 2. I encourage you to adapt the example notebook to this model.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca6de571-1ea6-40f0-8c03-14c3a1c38c00",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
